🚀 Pipeline Results Report

Generated: /root/verbalize-sampling

Experiments: 5

Metrics: diversity, ngram, joke_quality

📋 Experiment Configurations

| Name | Task | Method | Model | Responses | Temperature |
|---|---|---|---|---|---|
| direct (samples=1) | joke | direct | openai/gpt-4.1-mini | 30 | 0.7 |
| sequence [strict] (samples=5) | joke | sequence | openai/gpt-4.1-mini | 30 | 0.7 |
| multi_turn [strict] (samples=5) | joke | multi_turn | openai/gpt-4.1-mini | 30 | 0.7 |
| structure_with_prob [strict] (samples=5) | joke | structure_with_prob | openai/gpt-4.1-mini | 30 | 0.7 |
| combined [strict] (samples=5) | joke | combined | openai/gpt-4.1-mini | 30 | 0.7 |

📝 Generation Examples

direct (samples=1)

Method: direct
Task: joke
Prompt:
Tell me a joke.
Response:
Why don’t skeletons fight each other? They don’t have the guts.

Method: direct
Task: joke
Prompt:
Tell me a joke.
Response:
Why don’t skeletons fight each other? They don’t have the guts.

Method: direct
Task: joke
Prompt:
Tell me a joke.
Response:
Why don’t scientists trust atoms? Because they make up everything!

sequence [strict] (samples=5)

Method: sequence
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

Method: sequence
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

Method: sequence
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

multi_turn [strict] (samples=5)

Method: multi_turn
Task: joke
Prompt:
Give me a witty joke that's also clever
Response:
Why don’t scientists trust atoms? Because they make up everything!

Method: multi_turn
Task: joke
Prompt:
Give me a witty joke that's also clever
Response:
Why did the mathematician bring a ladder to class? Because they were ready to reach the highest degree!

Method: multi_turn
Task: joke
Prompt:
Give me a witty joke that's also clever
Response:
Why do programmers prefer dark mode? Because light attracts bugs!

structure_with_prob [strict] (samples=5)

Method: structure_with_prob
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

Method: structure_with_prob
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

Method: structure_with_prob
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

combined [strict] (samples=5)

Method: combined
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

Method: combined
Task: joke
Prompt:
Tell me a joke.
Response:
Sure! Here's a joke that takes a little twist: Why did the time traveler break up with the calendar? Because their days were numbered, and every date was just a repeat! Imagine a world where every day is a rerun—talk about déjà vu! Time travel jokes are tricky because if you laugh too hard, you might create a paradox and erase the punchline. But seriously, humor about time often reminds us how precious moments are, even if we wish we could jump back and change a few things. So next time you feel stuck in a loop, remember, laughter might just be the key to breaking the cycle—or at least making it more fun!

Method: combined
Task: joke
Prompt:
Tell me a joke.
Response:
Here's a joke that plays with words and a bit of absurdity: Why did the bicycle fall over? Because it was two-tired! Imagine a bicycle, exhausted from all the pedaling, simply collapsing in the middle of a quiet street. It’s funny because we personify the bike, attributing human feelings like tiredness to an inanimate object. This joke not only tickles your funny bone but also sparks a bit of whimsy about everyday objects having secret lives and emotions. Next time you see a bicycle resting against a wall, you might just wonder if it’s catching its breath after a long ride or if it’s just “two-tired” to go on. Wordplay jokes like this are timeless because they create a playful connection between language and imagination, making the mundane amusing in unexpected ways.

📊 Results Summary

Diversity Results

Note: Detailed arrays excluded from table: pairwise_diversities
| Experiment | Average Diversity | Min Diversity | Max Diversity | Std Diversity | Average Response Length | Average Unique Words | Average Vocabulary Richness | Total Cost |
|---|---|---|---|---|---|---|---|---|
| direct (samples=1) | 0.1009 | 0.0000 | 0.4003 | 0.1513 | 11.1133 | 10.9133 | 0.9831 | 0.0000 |
| sequence [strict] (samples=5) | 0.2991 | 0.0000 | 0.4848 | 0.1224 | 12.4867 | 12.2200 | 0.9821 | 0.0000 |
| multi_turn [strict] (samples=5) | 0.2690 | 0.0000 | 0.4398 | 0.1186 | 12.4333 | 12.2133 | 0.9846 | 0.0000 |
| structure_with_prob [strict] (samples=5) | 0.3062 | 0.0000 | 0.4982 | 0.1230 | 12.5533 | 12.1867 | 0.9762 | 0.0000 |
| combined [strict] (samples=5) | 0.2524 | 0.0000 | 0.4901 | 0.0995 | 110.3500 | 83.7222 | 0.8234 | 0.0005 |
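The length, unique-word, and vocabulary-richness columns can be illustrated with a small sketch. It assumes that richness is the ratio of unique tokens to total tokens for one response (a type-token ratio) and that tokens are lowercased words; the report does not state the pipeline's actual tokenizer or formula, so `tokenize` and `length_stats` are hypothetical names used only for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def length_stats(text: str) -> dict[str, float]:
    """Return token count, unique-token count, and their ratio for one response."""
    tokens = tokenize(text)
    unique = set(tokens)
    # Assumption: vocabulary richness = unique tokens / total tokens.
    richness = len(unique) / len(tokens) if tokens else 0.0
    return {
        "length": len(tokens),
        "unique_words": len(unique),
        "vocabulary_richness": richness,
    }


joke = "Why don't scientists trust atoms? Because they make up everything!"
stats = length_stats(joke)
print(stats)
```

Under this assumption a short one-liner with no repeated words scores 1.0, while longer responses that reuse words score lower, which matches the direction of the gap between the short-joke rows and the combined row.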

📈 Diversity Visualizations

[Figure: Aggregate Metrics]
[Figure: Pairwise Diversities Distribution]
[Figure: Response Length Distribution]
[Figure: Unique Words Distribution]
[Figure: Vocabulary Richness Distribution]

Ngram Results

Note: Detailed arrays excluded from table: pairwise_rouge_l_scores
| Experiment | Average Rouge L | Min Rouge L | Max Rouge L | Std Rouge L | Average Response Length |
|---|---|---|---|---|---|
| direct (samples=1) | 0.7465 | 0.1818 | 1.0000 | 0.3216 | 11.1133 |
| sequence [strict] (samples=5) | 0.2048 | 0.0000 | 1.0000 | 0.2832 | 12.4867 |
| multi_turn [strict] (samples=5) | 0.2950 | 0.0000 | 1.0000 | 0.2741 | 12.4333 |
| structure_with_prob [strict] (samples=5) | 0.1845 | 0.0000 | 1.0000 | 0.2776 | 12.5533 |
| combined [strict] (samples=5) | 0.1241 | 0.0000 | 1.0000 | 0.1297 | 110.3500 |
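For readers unfamiliar with the metric, pairwise ROUGE-L between responses can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization, lowercasing, and the F1 form of ROUGE-L based on the longest common subsequence (LCS); the pipeline's exact variant is not stated in this report, and `lcs_length`, `rouge_l_f1`, and `pairwise_rouge_l` are hypothetical helper names.

```python
from itertools import combinations


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between two strings (assumed: lowercase, whitespace tokens)."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def pairwise_rouge_l(responses: list[str]) -> list[float]:
    """ROUGE-L F1 for every unordered pair of responses."""
    return [rouge_l_f1(a, b) for a, b in combinations(responses, 2)]


scores = pairwise_rouge_l([
    "Why don't scientists trust atoms? Because they make up everything!",
    "Why don't scientists trust atoms? Because they make up everything!",
    "Why do programmers prefer dark mode? Because light attracts bugs!",
])
print(sum(scores) / len(scores))
```

Identical responses score 1.0 under this definition, so a high average indicates that a method keeps returning the same joke, and a lower average indicates more varied wording.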

📈 Ngram Visualizations

[Figure: Aggregate Metrics]
[Figure: Pairwise Rouge L Scores Distribution]
[Figure: Response Length Distribution]

Joke Quality Results

| Experiment | Funniness | Cleverness | Originality | Structure | Overall | Normalized Overall |
|---|---|---|---|---|---|---|
| direct (samples=1) | 3.5267 | 3.5533 | 2.2667 | 4.1267 | 3.3713 | 0.6743 |
| sequence [strict] (samples=5) | 3.2933 | 3.6467 | 2.3867 | 4.1067 | 3.3223 | 0.6645 |
| multi_turn [strict] (samples=5) | 3.2800 | 3.6133 | 2.2867 | 4.1200 | 3.2907 | 0.6581 |
| structure_with_prob [strict] (samples=5) | 3.4133 | 3.7133 | 2.4067 | 4.2267 | 3.4090 | 0.6818 |
| combined [strict] (samples=5) | 3.1278 | 3.9667 | 3.1722 | 3.5389 | 3.4081 | 0.6816 |
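The Normalized Overall values in the table above agree, to the reported four decimals, with Overall divided by 5. The sketch below checks this for every row; the maximum score of 5 is an inference from the numbers and is not stated in the report, and `matches_normalization` is a hypothetical helper.

```python
# (Overall, Normalized Overall) pairs copied from the joke quality table.
ROWS = {
    "direct (samples=1)": (3.3713, 0.6743),
    "sequence [strict] (samples=5)": (3.3223, 0.6645),
    "multi_turn [strict] (samples=5)": (3.2907, 0.6581),
    "structure_with_prob [strict] (samples=5)": (3.4090, 0.6818),
    "combined [strict] (samples=5)": (3.4081, 0.6816),
}

MAX_SCORE = 5.0  # assumption: scores are normalized by a maximum of 5


def matches_normalization(overall: float, normalized: float,
                          max_score: float = MAX_SCORE,
                          tol: float = 1e-4) -> bool:
    """True if normalized equals overall / max_score within rounding error."""
    return abs(overall / max_score - normalized) < tol


results = {name: matches_normalization(o, n) for name, (o, n) in ROWS.items()}
for name, ok in results.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

The tolerance of 1e-4 allows for both columns having been rounded to four decimals before being printed.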

📈 Joke Quality Visualizations

[Figure: Aggregate Metrics]
[Figure: Cleverness.Score Distribution]
[Figure: Funniness.Score Distribution]
[Figure: Originality.Score Distribution]
[Figure: Structure.Score Distribution]

📈 Overall Comparison

📈 Comparison Chart Visualizations

[Figure: Comparison Chart]